Cleaning

Exploratory analysis: What’s in the AFT corpus?

Tale Types

  • By ATU Chapter/Division (n)
  • Proportion of ATU tale types represented in AFT
  • Find higher-order morphology in AFT, where a morphometric is the set of motif groups that are combined in a tale type
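The coverage item in the list above could be computed as a simple set ratio. The following is a minimal sketch, assuming tale types are available as sets of identifiers. The function name `atu_coverage` and the placeholder IDs are hypothetical and are not taken from the AFT or ATU data.

```python
def atu_coverage(aft_types, atu_types):
    """Share of ATU tale types that appear at least once in AFT.

    Both arguments are sets of tale type identifiers (assumed format).
    """
    if not atu_types:
        raise ValueError("atu_types must not be empty")
    # Only count AFT types that are also valid ATU types.
    return len(aft_types & atu_types) / len(atu_types)


# Placeholder identifiers for illustration only, not real corpus data.
example_atu = {"T1", "T2", "T3", "T4"}
example_aft = {"T1", "T2"}
example_share = atu_coverage(example_aft, example_atu)
print(f"{example_share:.0%} of ATU types represented")
```

The same function could be applied per chapter by passing in the subsets of identifiers that belong to each chapter.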

By ATU Chapter

chapter                          n_types  n_tales  tales_per
TALES OF MAGIC                       240      607   2.529167
OTHER ANIMALS AND OBJECTS             50      104   2.080000
OTHER TALES OF THE SUPERNATURAL       36       72   2.000000
FORMULA TALES                         53       93   1.754717
ANIMAL TALES                         332      546   1.644578
RELIGIOUS TALES                      543      782   1.440147
ANECDOTES AND JOKES                  993     1218   1.226586
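The tales_per column appears to be n_tales divided by n_types, which matches every row above. The sketch below recomputes it from the counts in the table. The variable and function names are assumptions, and this is not necessarily how the original summary was produced.

```python
# Counts copied from the table above: (chapter, n_types, n_tales).
rows = [
    ("TALES OF MAGIC", 240, 607),
    ("OTHER ANIMALS AND OBJECTS", 50, 104),
    ("OTHER TALES OF THE SUPERNATURAL", 36, 72),
    ("FORMULA TALES", 53, 93),
    ("ANIMAL TALES", 332, 546),
    ("RELIGIOUS TALES", 543, 782),
    ("ANECDOTES AND JOKES", 993, 1218),
]


def tales_per_type(n_types, n_tales):
    """Average number of tales per tale type in a chapter."""
    return n_tales / n_types


# Add the ratio to each row and sort by it, highest first.
summary = sorted(
    ((chapter, n_types, n_tales, tales_per_type(n_types, n_tales))
     for chapter, n_types, n_tales in rows),
    key=lambda row: row[3],
    reverse=True,
)

for chapter, n_types, n_tales, ratio in summary:
    print(f"{chapter:<33}{n_types:>7}  {n_tales:>7}  {ratio:>9.6f}")
```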

Content

Entities

Common phrases using:

  • TextRank
  • collocation/word frequency
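For the collocation/word frequency item, a minimal standard-library sketch is shown here. It counts single words and adjacent word pairs (bigrams) as a first approximation of common phrases. The sample text is invented for illustration and is not from the AFT corpus, and the tokenizer is a deliberately simple assumption.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def word_and_bigram_counts(text):
    """Return (word frequencies, adjacent-pair frequencies) for a text."""
    tokens = tokenize(text)
    word_counts = Counter(tokens)
    # zip(tokens, tokens[1:]) pairs each token with its successor.
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    return word_counts, bigram_counts


# Invented sample text, for illustration only.
sample = ("Once upon a time there was a king. "
          "Once upon a time there was a queen.")
word_counts, bigram_counts = word_and_bigram_counts(sample)

print(word_counts.most_common(3))
print(bigram_counts.most_common(3))
```

Raw bigram counts favor frequent function words, so an association measure such as pointwise mutual information is a common next step when ranking collocations.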

Topic modeling

  • Define cleaning tasks and stop words to improve topic model performance; right now the topics are too close together, forming a few main clusters that are difficult to distinguish
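One possible way to find corpus-specific stop word candidates is to flag terms that occur in most documents, since such terms tend to appear in every topic and blur the boundaries between them. The sketch below is a heuristic under that assumption, not an established part of this project. The function name `high_df_terms`, the `max_df` threshold, and the toy documents are all hypothetical.

```python
import re
from collections import Counter


def high_df_terms(docs, max_df=0.8):
    """Return terms that occur in more than max_df of the documents.

    docs is a list of raw text strings; max_df is a fraction in (0, 1].
    """
    doc_freq = Counter()
    for doc in docs:
        # Use a set so each term counts once per document.
        doc_freq.update(set(re.findall(r"[a-z']+", doc.lower())))
    n_docs = len(docs)
    return {term for term, df in doc_freq.items() if df / n_docs > max_df}


def remove_terms(doc, stop_words):
    """Tokenize a document and drop any token in stop_words."""
    return [t for t in re.findall(r"[a-z']+", doc.lower())
            if t not in stop_words]


# Toy documents, invented for illustration only.
toy_docs = [
    "the king and the fox",
    "the queen and the wolf",
    "the fox and the wolf",
]
candidates = high_df_terms(toy_docs, max_df=0.8)
cleaned = [remove_terms(doc, candidates) for doc in toy_docs]
print(sorted(candidates))
print(cleaned)
```

The candidate list would still need manual review before use, because a high-frequency term can also be a meaningful genre marker.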